PREFACE


An important objective in automatic language data-processing
research is the development of computer programs for parsing
written texts. These programs are useful both in language
research and in such applications as machine translation,
indexing, and abstracting. The present Memorandum is one of
a series of RAND studies on sentence-structure description in
Russian texts. The immediate objective is to improve the
existing capability for Russian-English automatic translation.
SUMMARY


A problem frequently encountered in the automatic parsing of
Russian texts is the correct structuring of prepositional
phrases in sentences. Studies of text samples indicate that,
when other criteria are absent, the syntactic governors of
prepositions can be determined with a high degree of accuracy
by reference to the relative position and part-of-speech of
elements in the clausal environment.
THE POSITION OF PREPOSITIONAL PHRASES IN RUSSIAN


1.  INTRODUCTION 

One of the primary goals of computational linguistics is the
development of automatic parsing programs for use in
processing written texts. There is enormous utility in
computer programs that will produce structural descriptions
of sentences comparable to the descriptions produced by
humans. Although both products are admittedly imperfect,
given the present inadequacies of grammatical theory, the
information generated in the course of automatic syntactic
analysis is of immediate use in language study: the parsing
programs themselves can be improved, and a "data base" is
provided for testing the theoretical principles underlying
the program. The parsing routine is a research tool for the
automatic assembling of facts about the combinatorial
properties of sentence elements; in particular, it is a means
of achieving specificity in syntactic description. (In
addition, automatic parsing has practical applications in
such activities as machine translation, indexing, and
abstracting.)

A parsing program must be based on a model of language,
however imperfect and tentative that model may be. The
program described in the present paper is based on a simple
dependency grammar, adopted for linguistic research in
Russian at The RAND Corporation [1]. In this model, the
structure of a sentence is conceived as a tree-like set of
relations among the words in the sentence. One word in every
clause is said to be independent (in our convention, the
predicate); except for this item, every other word in the
clause "depends" on one and only one other word. (Double
dependency is allowable in special instances, e.g., with
relative pronouns.) The word on which a word depends is said
to be its "governor"; the latter term is used merely as a
complement to "dependent," and does not necessarily
correspond to the usage of the term in traditional
grammatical description. The syntactic relationships
designated here by dependency include instances of agreement
and government (normally characterized by flexion in
Russian), and complementation or modification of meaning.
This model results in unique parsings for most sentences,
assuming the acceptance of certain conventions. The latter
are useful when the dependency of a word on a group of words
is indicated; here, it is necessary to select as governor one
word that will represent the group.
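
The dependency model just described can be illustrated with a
minimal Python sketch. The names Word and is_valid_clause, and
the use of integer indices for governors, are assumptions
introduced here for illustration; they are not taken from the
RAND program, and double dependency is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    form: str
    # Index of the governing word in the clause; None marks the
    # independent word (by the paper's convention, the predicate).
    governor: Optional[int] = None


def is_valid_clause(words: List[Word]) -> bool:
    """Return True if exactly one word is independent and every other
    word reaches it by following governor links (no cycles)."""
    roots = [i for i, w in enumerate(words) if w.governor is None]
    if len(roots) != 1:
        return False
    for start in range(len(words)):
        seen = set()
        current = start
        while words[current].governor is not None:
            if current in seen:
                return False  # a cycle: the links do not form a tree
            seen.add(current)
            current = words[current].governor
            if not 0 <= current < len(words):
                return False  # governor index points outside the clause
    return True


# TURISTY UEKHALI IZ MOSKVY ("the tourists left Moscow"): the verb is
# independent, IZ depends on the verb, and MOSKVY depends on IZ.
clause = [
    Word("TURISTY", governor=1),
    Word("UEKHALI"),
    Word("IZ", governor=1),
    Word("MOSKVY", governor=2),
]
print(is_valid_clause(clause))
```

Storing a single governor index per word enforces the "one and
only one other word" condition by construction; only the
single-root and no-cycle conditions need to be checked.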

The automatic parsing program for Russian, as developed at
RAND, has been described elsewhere [2]. For present purposes,
we note only that a sample of some 10,000 sentences of text
from Russian physics journals has been subjected to the
program; the resulting descriptions have been verified and
corrected by humans. All dependency relations between the
word pairs in this text sample are recorded on magnetic tape.
Automatic retrieval programs applied to this "processed" text
enable researchers to conduct distributional studies (e.g.,
through concordances for specified words or syntactic
constructions).

The present study deals with a common but difficult problem
in machine parsing: the automatic structuring of
prepositional phrases in sentences. As with other sentence
elements, the preposition is said to depend on one other word
(its governor); in turn, it governs at least one other word
(its dependent, or object). The difficulty lies in the
assignment of the correct governor for the preposition. We
should stress the fact that we are not seeking to determine
the governor of the prepositional phrase. Machine parsing
proceeds by a comparison of the respective morphological and
syntactic properties of pairs of word-tokens in the sentence;
these properties are stored in the machine dictionary, and
unless we create an unmanageably large dictionary by storing
prepositional phrases, we can parse only in terms of word
pairs (preposition/governor and preposition/dependent). Can
we construct a computer program capable of accounting for the
numerous relationships that a preposition bears to its
syntactic governor? Can we ignore, or can we utilize, the
"semantic" information contained in the prepositional phrase,
information that the human parser takes full advantage of?
The following discussion attempts to deal both with these
theoretical problems and with the immediate, "practical"
problem of improving the machine parsing program.

The  main  discussion  will  be  prefaced  by  some  remarks 
on  the  structuring  of  prepositional  phrases  in  our  version 
of  dependency  grammar. 

2.  THE  PREPOSITION  IN  STRUCTURE 

Since prepositions are relational words, it is convenient to
think of a three-term structure: G/P/D (governor,
preposition, dependent). (Other word-classes involved in
three-term structures are coordinating and subordinating
conjunctions, relative pronouns, and relative adverbs.) The
relation of P to D can be specified morphologically in
Russian, and presents no serious problem for the machine
parser. Both word order and case of the D (assuming flexion)
are precise and obligatory. In the computer program the task
is performed by a simple matching of pairs of morphological
codes: the dependent is said to be the first following
occurrence whose codes match the part-of-speech and case
requirement codes of the P. In rare instances, the machine is
confronted with two possible dependents for the P. These
exceptions are of two main types: those in which a following
occurrence is homographic (e.g., IZ ETOGO TOCHNOGO
OPREDELENIYA NE POLUCHAEM = "from this (an) exact definition
we do not obtain"), and those in which a nested structure is
present (ETO ZAVISIT OT OPREDELYAEMOGO S POMOSHCH'YU
SPEKTROMETRA ZNACHENIYA = "this depends on the
obtained-with-the-aid-of-a-spectrometer value"). (In these
examples, the P and the two possible Ds are underlined.)
Here, the mechanical resolution of the problem depends on the
correct structural description given to the whole clause.
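
The matching step for the dependent can be sketched as
follows. This is a minimal illustration, not the RAND routine:
the Token fields, the code values ("N", "gen", and so on), and
the PREP_REQUIREMENTS table are hypothetical stand-ins for the
morphological codes of the machine dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Token:
    form: str
    pos: str                                      # hypothetical part-of-speech code
    cases: Set[str] = field(default_factory=set)  # hypothetical case codes


# Hypothetical requirement codes: preposition -> (admissible parts of
# speech for the dependent, admissible cases).
PREP_REQUIREMENTS: Dict[str, Tuple[Set[str], Set[str]]] = {
    "IZ": ({"N"}, {"gen"}),
    "NA": ({"N"}, {"acc", "loc"}),
}


def find_dependent(tokens: List[Token], prep_index: int) -> Optional[int]:
    """Return the index of the first following token whose codes match
    the requirement codes of the preposition, or None if none match."""
    allowed_pos, allowed_cases = PREP_REQUIREMENTS[tokens[prep_index].form]
    for i in range(prep_index + 1, len(tokens)):
        token = tokens[i]
        if token.pos in allowed_pos and token.cases & allowed_cases:
            return i
    return None


tokens = [
    Token("TURISTY", "N", {"nom"}),
    Token("UEKHALI", "V"),
    Token("IZ", "P"),
    Token("MOSKVY", "N", {"gen"}),
]
dependent = find_dependent(tokens, 2)  # index of MOSKVY
```

A first-match rule of this kind is adequate only for the
ordinary case; the homographic and nested structures noted
above would still require the description of the whole clause.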

The situation with respect to the preposition and its
governor is far more complicated. The relative position of
the two items is not constant, and morphology per se is of no
help. Traditionally, Russian grammarians have used two terms
to describe this relation: government and adjoinment
(primykanie). It is widely agreed that there are substantial
differences in the strength of the connection between P and
G. Some grammarians have tried to distinguish between
"strong" and "weak" government. A. M. Peshkovskij, for
example, described the weakly governed preposition as one for
which the connection may depend on such factors as word order
or meaning; sometimes this situation creates ambiguity of
meaning, and sometimes the prepositional phrase may be
connected with the whole clause [3]. Strong government is
said to contrast in the above respects, although the strength
of the connection may vary considerably in different word
combinations. In the case of adjoinment, the connection is
felt to be weaker still, as with adverbial modification.

The validity of these distinctions has been seriously
questioned. For example, the Academy of Sciences' Grammar of
the Russian Language characterizes weak government as a
"diffuse metaphor," and stresses the need for further study
and greater precision in this area of grammar [4]. We could
not agree more. From our point of view, strong and weak
government, and adjoinment, represent little more than
intuitive judgements of the kind of syntactic connection that
has already been made. The gradations in the strength of the
connection are of little utility in analysis itself, since
our immediate concern is to make the connection (in our
terms, to find the G), not to assess its "quality." A
separate treatment of this whole problem is planned. For the
present, we remark only (i) that we propose to identify all
connections, rejecting any implication that certain types of
connections are less "important" than others, and (ii) that
the frequency of occurrence in given texts of G/P word-pairs
is a useful criterion in automatic parsing.

The criterion of frequency of G/P pairs is, in fact, a test
for the intuitive judgement that strongly governed
complements answer questions implied or engendered by the
governor [5]. If the verbs UEKHAT' = "to leave," or OTNOSIT'
= "to relate," so to speak require complementation of the
kind "whence," or "to what, or to whom," then we should
frequently encounter these complements in written text. We
have, in fact, retrieved from our physics text a large number
of frequently occurring G/P pairs. Syntactic codes have been
assigned in the machine dictionary to both members of such
pairs; if, under appropriate conditions, the codes for these
two words are matched during the automatic parsing routine, a
pairing is effected. (The appropriate conditions include
proper case for the object of the P, and the existence of
previously established dependency pairs in the sentence,
i.e., "precedence" [2].)

The term "strong government" will be used in the present
discussion to include the following types of G/P pairs:

(i) pairs for which the P is an obligatory complement
(VLIYAT' NA = "to influence," ZAVISET' OT = "to depend on");

(ii) pairs for which the P is a frequent complement
(SRAVNIT' S = "to compare with"; ZAVISIMOST' OT = "dependence
on");

(iii) pairs in which the translation of the preposition can
be effected only by its association with a given G. Pairs of
the latter type are listed in a separate paper [6]; here, the
quality or strength of connection between G and P varies
considerably (cf. BLAGODARIT' KOGO-TO ZA = "to thank someone
for," POPRAVKA NA = "correction for," VEROYATNOST' OT =
"probability of," RASPREDELENIE PO = "distribution with
respect to," SUMMIROVAT' PO = "to sum over," RESHIT' CHEREZ =
"to solve in terms of").

The current parsing routine operates on the principle that
two members of a pre-assigned G/P pair will be joined in a
dependency pair if they co-occur in the same clause (under
certain conditions). Our experience has been that the
resulting syntactic analysis is correct with very few
exceptions. The exceptions occur chiefly in structures that
are essentially ambiguous in the immediate context. Thus, in
TURISTY UEKHALI IZ MOSKVY = "the tourists left Moscow," the
strong governor of IZ is the verb; the situation becomes
ambiguous, however, with TURISTY IZ MOSKVY UEKHALI = "the
tourists from Moscow left," or with UEZD TURISTOV IZ MOSKVY =
"the departure of the tourists from Moscow." Since, in such
cases, the ambiguity can be resolved only by reference to a
larger context, we can only recognize that the co-occurrence
of members of a G/P pair is not a guarantee that the
connection can be correctly established.
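
The co-occurrence principle can be sketched as a table lookup.
The sketch below is an illustration under stated assumptions,
not the RAND routine: the function name, the use of dictionary
forms ("lemmas") as keys, and the forms TURIST and MOSKVA are
hypothetical, and the additional conditions mentioned above
(case of the object, precedence) are omitted. The four pairs
in the table are examples cited in this paper.

```python
from typing import List, Set, Tuple

# Pre-assigned G/P pairs, keyed by the dictionary form of the governor.
# Only a few pairs cited in the text are listed; a real table would be
# far larger.
STRONG_PAIRS: Set[Tuple[str, str]] = {
    ("VLIYAT'", "NA"),
    ("ZAVISET'", "OT"),
    ("SRAVNIT'", "S"),
    ("UEKHAT'", "IZ"),
}


def strong_governor_candidates(lemmas: List[str], prep_index: int) -> List[int]:
    """Return the indices of all words in the clause that form a
    pre-assigned G/P pair with the preposition at prep_index."""
    prep = lemmas[prep_index]
    return [
        i
        for i, lemma in enumerate(lemmas)
        if i != prep_index and (lemma, prep) in STRONG_PAIRS
    ]


# TURISTY UEKHALI IZ MOSKVY, reduced to hypothetical dictionary forms.
clause = ["TURIST", "UEKHAT'", "IZ", "MOSKVA"]
candidates = strong_governor_candidates(clause, 2)
```

Returning every candidate, rather than the first, keeps visible
the case in which co-occurrence alone does not settle the
connection.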

In running text, the ratio of strongly governed Ps to all
occurrences of Ps is rather low; in our physics text, the
ratio is estimated at 1 to 5 for approximately 34,000
occurrences of Ps. Quantitatively, the major task is the
attachment of weakly governed or "adjoined" prepositional
phrases to the correct sentence element. In this case, there
is no possibility of matching codes for G/P pairs.

So  far  as  the  human  parser  is  concerned,  three 
general  situations  may  be  noted. 

(i) The relation of P to G is clear, or can be specified with
a high degree of probability (ON UVIDEL KNIGU NA STOLE = "he
saw the book on the table").

(ii) The relation of P to G is ambiguous. This situation is
commonly found in the frame transitive verb/noun
object/prepositional phrase, where the latter can logically
refer to (depend on) either the verb or the noun: ON NAPISAL
SLOVA NA DOSKE = "He wrote the words on the blackboard." In
some instances, the ambiguity may be irrelevant in
translation ("I met the man on the corner"). In others, the
structural description will affect the translation: "I hit
the man for Nixon," "I hit the man with the ax," "I read the
letter to John." In essence, this problem cannot be solved
within the microcontext, although the probabilities may vary
for different structurings.

(iii) The relation of P to G is not specific and is not
relevant to meaning. Here, the relative position of the
prepositional phrase is often the key to structural
description; a shift in position, however, reflects only a
shift in emphasis. For example, in the following sentences,
the prepositional phrase would probably be connected with the
preceding noun: "The temperature in the room rose," "The
value for x was determined." Intuitively, we may doubt that
any essential change in meaning results if the phrase is
attached to the verb; the latter structuring is possible in
the above examples, and is probable when the phrase follows
the verb: "The temperature rose in the room," "The value was
determined for x." The point is simply that in some
structures the relation of G to P is less obvious and less
dependent on meaning than in others.

In dealing with the first and last of these three situations,
it is obvious that the human parser can bring to bear an
enormous store of knowledge of the appropriate (i.e., the
probable). It is precisely this kind of "semantic"
information that the machine lacks. One despairs of embedding
in a machine program the information necessary to parse
correctly sentences of the first type ("I read the book in
bed," "I read the book under the dictionary," etc.), or to
decide that the structuring is nonessential to meaning. If
this is true, should we not admit that accurate machine
parsing is an impossible goal? In the following, an estimate
is given of the magnitude of the problem in running text; a
solution is suggested in terms of the position of the
preposition relative to other sentence members.

3. THE RELATIVE POSITION OF A PREPOSITIONAL PHRASE
AND ITS GOVERNOR

The relative ordering of sentence elements is an important
factor in the redundancy of natural language. Although word
order is freer in Russian than in English, there are a number
of severe restrictions common to both. In many three-term
structures, the relative ordering of syntactic elements is
fixed. Thus, when two elements (A and B) are joined by a
subordinate conjunction or a relative adverb (J), a maximum
of two orderings are permitted: A/J/B ("I know/that/he is
coming," "I know/where/he lives"), or, for special emphasis,
J/B/A ("That/he is coming/I know," "Where/he lives/I know").
The other four orderings of these elements are impossible,
e.g., A/B/J ("I know/he is coming/that," "I know/he
lives/where"). The restriction is simply that the order J/B
be preserved.
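
The count of permitted orderings can be checked by
enumeration. The short Python sketch below reads "the order
J/B be preserved" as requiring B to follow J immediately; this
reading is an assumption made here, chosen because it yields
exactly the two orderings named above. The function name is
introduced for illustration only.

```python
from itertools import permutations
from typing import List, Tuple


def permitted_orderings(elements: Tuple[str, ...] = ("A", "J", "B")) -> List[str]:
    """List the orderings in which J is immediately followed by B."""
    result = []
    for order in permutations(elements):
        j = order.index("J")
        if j + 1 < len(order) and order[j + 1] == "B":
            result.append("/".join(order))
    return result


print(permitted_orderings())
```

Of the six possible orderings of the three elements, only
A/J/B and J/B/A pass the test; the remaining four are the
orderings described above as impossible.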

In the case of coordinate conjunctions and relative pronouns,
only one ordering is found: A/J/B (cf. "and/men/women," "The
man/I saw/whom"). It could be argued that these restrictions
are due chiefly to convention, since they are not necessarily
followed in other languages, e.g., in Latin and Greek. At any
rate, since in our dependency grammar we have agreed to
consider prepositions as elements in three-term structures,
we may pose the question of the relative position of the
other two elements. Within the prepositional phrase, the
ordering of the elements is fixed in Russian and in English:
preposition/object. For the sake of convenience, we may
represent this combination simply as P. Our inquiry, then, is
concerned with the relative ordering of the syntactic
governor of P.

At the outset, it should be stressed that in our text, the
governors of Ps are severely limited as to part-of-speech.
With trivial exception, the governors are verbals (predicates
and participles), nouns, and adjectives. The search for the
governor can therefore be confined to representatives of
these word classes in the clause. In our text, adjectives
serve as governor in less than 1% of all occurrences of Ps;
our limited experience suggests that, with the exception of
strong government, a necessary and sufficient condition for
establishing the adjective as governor is its position
immediately preceding the P. Accordingly, it is appropriate
to deal with the G/P problem in terms of verbals (V) and
nouns (N) in the clausal environment of the P.

Six possible orderings of N, V, and P exist:

(1) N/P/V, (2) N/V/P, (3) P/N/V, (4) P/V/N, (5) V/P/N,
(6) V/N/P.

Of these orderings, only Nos. 1 and 6 present a problem. If
the ordering described in Nos. 2 and 4 occurs, our data and
the syntactic characteristic known as projectivity tell us
that the V is G [7]. In sentences with the sequence N/V/P
("The words/were written/on the blackboard") or P/V/N ("On
the blackboard/were written/the words"), the P cannot depend
on the N. If the ordering described in Nos. 3 and 5 occurs,
our experience is that the V is G; to put it differently, in
our text, a P does not depend on a noun to its right. This
conclusion is based on a sample of prepositional occurrences
in the physics text:
             Right-Hand Governors of Prepositions

                     Occurrences   % of Occurrences   Non-Predicate   Noun
         Frequency   of            with               following       following
P        of P        following G   following G        G               G

DLYA       1395         486            35%                4              0
DO          104           3             3%                1              0
IZ          280          58            20%                0              0
K           700          34             5%                3              0
NA         1315         212            16%                2              2
O           259           3             1%                1              0
OT          769           4            .5%                0              0
PO          749         114            15%                0              0
S           505          38             7%                0              0
V          1629         288            17%                2              2

Total      7705        1240                              13              4

The two occurrences of a following noun governor for the
preposition "V" were special cases, involving larger
coordinate structures identifiable by punctuation. The two
occurrences with the preposition "NA" were contained in the
clauses NA X OKAZYVAYUT VLIYANIE KAKIE-TO KOMPONENTY = "on X
exert an influence certain components" and DAT' NA VOPROS
OTVET = "give to the question the answer"; in both of these
clauses we have instances of strong government, and, in
addition, verb/noun phrases that are equivalent to, or
transformations of, verbs (exert an influence = to influence,
give an answer = to answer). Assuming we have the capability
to recognize either type of structure, the occurrences of a
following noun governor for a preposition are reduced to zero
in our sample.

The extremely small incidence of a right-hand noun governor
for Ps is probably characteristic of the "scientific" prose
from which our sample is taken. The incidence may be somewhat
higher in other kinds of written discourse, both in Russian
and English, and is certainly higher in the spoken language.
For example, with the (English) expletive, "there is," the
inverted order PVN is possible: "To this room there are three
entrances," or "To this question there is no good answer." It
seems likely that even in Russian such inversions are
extremely rare except in cases of strong government. (The use
of such frames to test "strong" versus "weak" government is
indicated.) One may apparently discount in written Russian
scientific text the possibility that a "weakly governed" or
"adjoined" P will precede its noun governor (cf. S RUZH'EM
CHELOVEKA UVIDEL = "with the gun the man I saw," or S RUZH'EM
UVIDEL CHELOVEKA = "with the gun (I) saw the man.") In any
event, we presume in our parsing program the ability to
recognize cases of strong government; we merely note here the
absence of right-hand noun governors of Ps in the case of
weak government or adjoinment.

To summarize: when P, N, and V occur in conjunction, as they
often do, there are grounds for utilizing the relative
position of these elements as a means of determining the
syntactic governor of the preposition: (1) If the P is nearer
the V than it is to the N, the V is the governor (PVN and
NVP); (2) If the N follows the P, the V is the governor (PNV
and VPN). We can then turn to the two remaining sequences of
these elements (NPV and VNP), again confining our study to
instances of other than strong government.
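
The two positional rules just summarized can be written as a
small decision function. The following Python sketch is an
illustration only: the function name, the three-letter string
encoding of a sequence, and the label "unresolved" are
assumptions made here, and strong government is taken to have
been handled elsewhere.

```python
def governor_by_order(sequence: str) -> str:
    """Apply the two positional rules to a sequence such as "NVP".

    Returns "V" when the rules select the verbal as governor, and
    "unresolved" for the two remaining sequences (NPV and VNP).
    """
    if sorted(sequence) != ["N", "P", "V"]:
        raise ValueError("sequence must contain N, P, and V exactly once")
    p = sequence.index("P")
    n = sequence.index("N")
    v = sequence.index("V")
    if abs(p - v) < abs(p - n):
        return "V"  # rule (1): P nearer the V than the N (PVN, NVP)
    if n > p:
        return "V"  # rule (2): the N follows the P (PNV, VPN)
    return "unresolved"  # NPV and VNP: the N immediately precedes the P


for seq in ("NPV", "NVP", "PNV", "PVN", "VPN", "VNP"):
    print(seq, governor_by_order(seq))
```

Four of the six sequences are settled by position alone; the
function leaves NPV and VNP open, which are the sequences taken
up next.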

Concordances for prepositions in the NPV and VNP sequences
have not yet been prepared for our complete text. Fairly
extensive sampling indicates that the sequence NP occurs in
about one-third of all occurrences of Ps, and that in such
environments there is a great tendency for the noun to be the
governor. Thus, in one sample of 39 pages of text written by
13 different authors, 1046 Ps were encountered; of these, 342
(33%) occurred in the sequence NPV or VNP. Excluding the few
instances of strong government by the predicate, the V was
the governor in only 33 instances (9%). The criterion of
position could be used to assign the N as correct governor in
nine out of ten instances, given the absence of other
criteria. (We stress the fact that if a series of Ns precedes
the P, the question still remains: which N is governor? See
the discussion below.)

An examination of the 33 instances in which the V governed
the P in the given structures suggests that morphological
criteria are inadequate, i.e., it is not sufficient to
determine the part-of-speech of words in context. Leaving
aside the question of meaning, it is important to take into
account both the function of the prepositional phrase itself
and the function of the preceding noun. Here, two facts may
be noted: (1) The prepositional phrase itself sometimes
served an adverbial function, so that its dependency on the V
(rather than on the N) is strongly suggested or required. For
example, in the fragment, ETA LINIYA V DEJSTVITEL'NOSTI NE
YAVLYAETSYA... = "this line in reality is not," the phrase V
DEJSTVITEL'NOSTI = "in reality" serves an adverbial function
similar to the function of the adverb DEJSTVITEL'NO =
"really." This fact should prohibit its being connected with
the preceding noun, LINIYA = "line." Other prepositional
phrases serving an adverbial function included V PRINTSIPE =
"in principle," V SREDNEM = "on the average," V MINIMUME =
"at the minimum," S TOCHNOST'YU = "with an accuracy," and V
DVA RAZA = "twice." (2) The preceding N sometimes serves an
adverbial function, either as the final element of a
prepositional phrase or as an instrumental noun dependent of
the verb: USPEKHI BYLI DOSTIGNUTY V POSLEDNEE VREMYA V
IZUCHENII = "successes were achieved recently in the study."
Here too, VREMYA in V POSLEDNEE VREMYA = "recently" could be
excluded as a potential governor of the P, clearing the way
for consideration of the verb as the governor.

It is not clear that an adverbial function can be assigned a
priori to certain prepositional phrases, regardless of
context. The question requires further study. If these
phrases can be treated as fixed combinations with a fixed
syntactic function, the 33 exceptions referred to above would
be reduced to 15 in our sample, i.e., to 1.4% of the total
occurrences of Ps or 4% of the cases of N/P. The following
are typical constructions for which no parsing solution is
offered: POLE FORMIRUET KARTINU NA EKRANE = "the field forms
a picture on the screen," SVET PROPUSKAETSYA CHEREZ TRUBKU S
PARAMI NATRIYA = "light is passed through the tube with
sodium vapors," EKSPERIMENTOV PROVEDENNYKH V LABORATORII PO
VISUALIZATSII = "experiments conducted in the laboratory on
the visualization (of)," RASSEYANIE IZUCHALOS' V RABOTE [1]
DLYA MISHENEJ = "scattering was studied in paper [1] for
targets." Some of these structures are inherently ambiguous;
the majority require the application of semantic criteria,
including an understanding of the subject matter (i.e., there
are constructions that only a physicist can parse confidently
and correctly).
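
The proportions quoted in this section can be rechecked from
the counts reported for the 39-page sample. The Python sketch
below only restates those counts; the variable and function
names are introduced here for illustration.

```python
# Counts reported for the 39-page sample.
TOTAL_PS = 1046          # all occurrences of prepositions
NP_SEQUENCES = 342       # Ps occurring in the sequence NPV or VNP
VERB_GOVERNED = 33       # of these, Ps governed by the V
AFTER_ADVERBIALS = 15    # exceptions left if adverbial phrases are fixed


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


share_np = percent(NP_SEQUENCES, TOTAL_PS)
share_verb = percent(VERB_GOVERNED, NP_SEQUENCES)
residue_of_all = percent(AFTER_ADVERBIALS, TOTAL_PS)
residue_of_np = percent(AFTER_ADVERBIALS, NP_SEQUENCES)

print(f"N/P sequences among all Ps:      {share_np:.1f}%")
print(f"V as governor within N/P:        {share_verb:.1f}%")
print(f"Unresolved, share of all Ps:     {residue_of_all:.1f}%")
print(f"Unresolved, share of N/P cases:  {residue_of_np:.1f}%")
```

The first, third, and fourth ratios agree with the figures of
33%, 1.4%, and 4% given in the text. The second comes to
slightly more than the 9% quoted, presumably because of
truncation or because the few strong-government instances
were excluded from the base.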

The  point  of  the  preceding  discussion  is  that 
constructions  difficult  or  impossible  to  parse  automatically 
are  encountered  infrequently  in  running  text.  To  the  writer, 
this  conclusion  was  unexpected. 

The major problem remaining is the selection of the correct
noun governor from a series of nouns preceding the P.
Constructions of this type (N.../N/P) constitute something
less than half of the occurrences of N/P; the first noun in
the series is of course modified by a string of nouns,
normally in the genitive case, occasionally in the
instrumental or dative case. Assuming the absence of strong
government, which noun shall be chosen as governor of the
following P?

To this author's knowledge, the only extensive, programmatic
solution of this problem is that advanced by Shelimova [8].
In this system, Russian Ps are divided into three main types:
those that (in our terminology) cannot depend on a noun,
those that can depend only on a noun that is a deverbative,
and those that can depend on any noun. The latter group,
which includes all the frequent Ps, is divided into seven
sub-groups; for each sub-group various criteria are
established for specification of the G. These criteria
include the deverbative character of members of the preceding
noun series, the deverbative character of the dependent of
the P, the nominative case of a preceding noun, and the
"spatial" significance of preceding Ns or of a V in the
sentence. For example, with the preposition "S," the
deverbative N in the preceding N series is chosen as the G:
PRIMENENIE ETOGO KRITERIYA S OSTOROZHNOST'YU VPOLNE DOPUSTIMO
= "the application of this criterion with care is completely
admissible." For the preposition "PRI," in the absence of a
preceding deverbative N and given a deverbative N object of
the P, the predicate is chosen as G: CHITATEL' VSTRETIT MNOGO
DRUGIKH PRIMEROV TAKOGO RODA PRI IZUCHENII RAZLICHNYKH OTDELOV
MATEMATIKI = "the reader will meet many other examples of
this kind in the study of different branches of mathematics."

Shelimova properly disclaims infallibility for her program,
remarking occasionally that a given routine will be correct
more often than not. In some instances, no solution is
obtained. Although the present author is not in agreement
with some of the classifications given by Shelimova, and has
noted a number of errors that would result from the
application of her program, he agrees that the principles
advanced are valid and generally useful. The whole problem,
one of the most complicated in automatic parsing, deserves
special study. For the present, we can only suggest the
following: (i) the volume and complexity of the task requires
that automatic methods be used in the compilation of these
constructions from written text and in the building of
variously sorted concordances; (ii) further knowledge of the
syntactic function of word combinations is required,
including a better understanding of noun/genitive noun
combinations and of the differing functions of prepositional
phrases. In other words, this is not a special, isolated
problem, but one that waits upon the further accumulation of
information about syntax and its relation to meaning. (What,
for instance, is the meaning of the fact that certain nouns
cannot govern certain prepositions?) It is most doubtful that
a satisfactory solution to our problem can be obtained
without this further knowledge.

4.  CONCLUSIONS 

Samplings of parsed Russian text indicate that the relative
position of prepositions and their potential syntactic
governors is a useful criterion in automatic parsing
programs. Assuming that routines exist to account correctly
for strong government of prepositions and for the
coordination of sentence elements, the following principles
can be incorporated in an automatic parsing program:

(1) The search for governors of Ps will be limited to nouns,
adjectives, and predicates (including participles) in the
clause.

(2) If the P is clause-initial, the governor is the
predicate. (No exceptions to this rule were observed.)
"Non-initial" elements, for this purpose, include
introductory words and phrases used as sentence modifiers,
and clause markers (coordinate and subordinate conjunctions,
relative adverbs, etc.).

(3) If the P immediately follows a predicate, a participle,
or an adjective, the preceding occurrence is the governor.
(No exceptions to this rule were observed.)

(4) If the P immediately follows a noun, this noun (or one of
the nouns in a preceding sequence of nouns) is the governor
in approximately 90% of such occurrences. The number of
exceptions to this rule can be halved if it can be
established by other means that an adverbial function is
being served by the prepositional phrase itself, or by an
immediately preceding prepositional phrase (which terminates
with a noun).
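
The four principles can be combined into a single lookup, as
in the following Python sketch. It is a simplified
illustration under stated assumptions, not the RAND program:
the tag names ("PRED", "PART", "A", "N", "P", "X") and the
function name are hypothetical, introductory words and clause
markers are assumed to have been stripped from the clause
already, strong government and coordination are assumed to be
handled elsewhere, and for rule (4) the sketch simply returns
the immediately preceding noun without choosing among a series
of nouns.

```python
from typing import List, Optional


def positional_governor(tags: List[str], prep_index: int) -> Optional[int]:
    """Return the index of the governor chosen by position, or None.

    tags holds one hypothetical part-of-speech tag per word:
    "PRED" (predicate), "PART" (participle), "A" (adjective),
    "N" (noun), "P" (preposition), "X" (anything else).
    """
    if prep_index == 0:
        # Rule (2): a clause-initial P is governed by the predicate.
        for i, tag in enumerate(tags):
            if tag == "PRED":
                return i
        return None
    previous = tags[prep_index - 1]
    if previous in ("PRED", "PART", "A"):
        # Rule (3): the immediately preceding verbal or adjective governs.
        return prep_index - 1
    if previous == "N":
        # Rule (4): the preceding noun governs in most, not all, cases.
        return prep_index - 1
    # Rule (1): no other word class is considered as a governor here.
    return None


print(positional_governor(["N", "PRED", "P", "N"], 2))
print(positional_governor(["P", "N", "PRED", "N"], 0))
print(positional_governor(["PRED", "N", "P", "N"], 2))
```

Rules (2) and (3) are stated without observed exceptions, so
the function returns a single index for them; the result for
rule (4) should be read as the most probable governor only.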

The adequacy of these rules will be tested for a larger text
sample. No solution is here offered to the problem of
choosing the correct noun governor from the potential
governors in a preceding sequence of nouns. In general, our
experience with running text suggests that the relative
position of sentence elements is a much more significant
factor in structuring prepositional phrases than is strong
government. A strong governor is almost always discoverable
by one of the four rules given above; information about the
strength or quality of the connection is in some sense
redundant. Word order is also a powerful factor in
establishing weaker "semantic" connections between the unit
of the prepositional phrase and its syntactic governor. The
prospects for successful automatic parsing are greatly
increased by the strong tendency of writers of
"informational" texts to adhere to word order norms.